Abstract: Continuous Goal-Directed Actions (CGDA) is a
robot imitation framework that encodes actions as the changes 
they produce on the environment. While it presents numerous 
advantages with respect to other robot imitation frameworks 
in terms of generalization and portability, final robot joint 
trajectories for the execution of actions are not necessarily 
encoded within the model. This is studied as an optimization 
problem, and the solution is computed through evolutionary 
algorithms in simulated environments. Evolutionary algorithms 
require a large number of evaluations, which has made
the use of these algorithms in real world applications very 
challenging. This paper presents online evolutionary strategies, 
as a change of paradigm within CGDA execution. Online 
evolutionary strategies shift and merge motor execution into the 
planning loop. A concrete online evolutionary strategy, Online 
Evolved Trajectories (OET), is presented. OET drastically 
reduces computational times between motor executions, and 
enables working in real world dynamic environments and/or 
with human collaboration. Its performance has been measured 
against Full Trajectory Evolution (FTE) and Incrementally 
Evolved Trajectories (IET), obtaining the best overall results. 
Experimental evaluations are performed on the TEO full-sized 
humanoid robot with “paint” and “iron” actions that together 
involve vision, kinesthetic and force features. 

I. Introduction 

Robot imitation is a large area of study in robotics, which
focuses on how a robot can learn an action based on user
demonstrations. Popular frameworks for robot imitation typically
record the trajectories the robot or human performs during
demonstrations, successfully achieving reproduction of
the average trajectory by the robot end-effector in Cartesian
space. The most prominent examples of these frameworks are
Programming by Demonstration [1] and Dynamic Motion
Primitives [2]. These algorithms shine for actions that are
governed by geometry, such as performing gestures in the air,
or performing simple manipulation tasks of moving an object
from A to B. Additional works have introduced
active compliance [3] and obstacle avoidance [4].

However, a large body of actions that cannot be described 
solely in terms of human or robot geometric trajectories 
exists. In addition to joint or Cartesian positions, visual and 
force features provide relevant information when describing 
actions such as painting or ironing. Regarding visual features, 
recent studies have focused on learning end-to-end mappings 
directly from raw images to the robot joint space [5]. These 
works can involve large sets of images for pre-training,
robot-environment physical interaction, and additional hours for
training.

Continuous Goal-Directed Actions (CGDA) is a feature-agnostic
robot imitation framework [6]. Actions are encoded
as time series of the variation of scalar features extracted 
from sensor data during user demonstrations. While this 
framework provides a rich infrastructure for generalizing 
actions, this advantage comes at a cost. Final robot joint 
or end-effector Cartesian trajectories are not necessarily 
encoded within the model. Their components may have to be 
completely recomputed in order to comply with additional 
goals such as vision or force, or may be discarded manually 
or through automatic feature selection algorithms [7]. This 
recomputation is studied as an optimization problem, and 
has been solved through evolutionary algorithms in simulated 
environments. CGDA requires no previous interaction with 
the environment, or additional training times. 

In this paper, the online evolutionary strategy paradigm for 
CGDA execution is presented. The following contributions 
and consequences result from this change of paradigm. 

• Motor execution has been shifted and merged into the 
CGDA planning loop, enabling online adaptation for 
changing environments. 

• We demonstrate that the total time dedicated to mental 
simulation processes between motor executions is no 
longer dependent on the duration of the action. 

• The order of magnitude of the time between motor executions
has been reduced from minutes with Incrementally Evolved
Trajectories (IET) [8] to seconds with the presented online
evolutionary strategy, Online Evolved Trajectories (OET).

The “paint” action, Figure 1, has been used to evaluate
OET for a pure visual feature action, and an “iron” action
was used for kinesthetic and force features.



Fig. 1. Online Evolved Trajectories (OET) allows changes in the environment
during execution, as can be seen with the “paint” action.







II. CGDA Framework and Strategies 

CGDA is a framework for generalizing, recognizing and 
executing actions based on scalar features extracted from 
sensor data. In CGDA, an action is modelled as a trajectory
in a feature space of m scalar features, to represent the
changes it produces on the environment. Scalar features used
in CGDA, in addition to the geometric trajectory of a specific 
robot or human configuration, may include visual features of 
the environment, forces exerted by a given actuator, or even 
Cartesian positions of moving objects in the environment. 
Achieving a state of the environment in which the scalar 
features extracted from sensor perception match those of a 
given modelled action is studied as an optimization problem 
in the execution stage. The features are used as constraints 
to compute or recompute robot joint trajectories. 

In CGDA not only the goal set of features, but also
intermediate goals, must be achieved. An action is sliced
into n intermediate goals, computed as $n = \lfloor D_{time} / T_{min} \rfloor$, where
D_time is the average duration of user demonstrations, and
T_min is the minimum time interval between intermediate
goals. The generalized representation of an action X is
a trajectory in the m-dimensional feature space with n
intermediate goals X_j, as defined in (1).


X =

where m may be set manually or via feature selection
algorithms [7]. Let d_i be a discrete sample of any user
demonstration for feature i; then x_{ij} is computed as in (2).

Recognition of an action is performed by comparing
an observed action O with a generalized action X. The
discrepancy metric used is the sum, over the features i, of the
costs of aligning O_i with X_i, taken from each feature's cost
matrix c_p(O_i, X_i). Each cost matrix is computed as within
Dynamic Time Warping [9], as in (3).
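
The recognition discrepancy described above can be illustrated with a minimal sketch. It assumes the textbook Dynamic Time Warping recurrence with the absolute difference as local cost, and features stored one row per feature; the function names `dtw_cost` and `discrepancy` are hypothetical and are not taken from the CGDA implementation.

```python
from typing import Sequence


def dtw_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """Accumulated cost of the optimal DTW alignment of two scalar series."""
    inf = float("inf")
    # acc[i][j] holds the cost of the best alignment of a[:i] with b[:j].
    acc = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[len(a)][len(b)]


def discrepancy(observed, generalized) -> float:
    """Sum of the per-feature DTW alignment costs (one row per feature)."""
    return sum(dtw_cost(o_i, x_i) for o_i, x_i in zip(observed, generalized))
```

Because the alignment may stretch or compress time, two feature trajectories with the same shape but different timing yield a low discrepancy, which is relevant when interpreting discrepancy values against plotted trajectories.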


Let O_time be the duration of the observed action; n' is
computed as $n' = \lfloor O_{time} / T_{min} \rfloor$.

For execution, evolutionary algorithms are used to compute
robot joint trajectories. Three different strategies for
CGDA execution have been previously proposed: Full Trajectory
Evolution (FTE), Individual Evolution (IE), and Incrementally
Evolved Trajectories (IET) [8].

In FTE, Algorithm 1, each individual of the population
is composed of DoF·n parameters, where DoF is the
number of used degrees of freedom of the robot. The full
robot joint trajectory U is generated attempting to reach
all the intermediate goals simultaneously. Execution and
recognition are performed in an internal “mental” simulation,
where fitness f is the recognition discrepancy. Termination
conditions are evaluated a maximum of tc times, while
additionally monitoring the evolution of f.


Algorithm 1 Full Trajectory Evolution (FTE)
procedure FTE(X)
    individuals ← initialize
    while not termination_conditions do
        for each individual do
            U ← evolve(DoF·n)
            O ← mental_execution(U)
            f ← mental_recognition(O, X)
        end for
    end while
    motor_execution(U)
end procedure


In IE, Algorithm 2, each individual is composed of DoF
parameters. Joint positions U_j are generated independently
for each intermediate goal.


Algorithm 2 Individual Evolution (IE)
procedure IE(X)
    individuals ← initialize
    for j < n do
        while not termination_conditions do
            for each individual do
                U_j ← evolve(DoF)
                O_j ← mental_execution(U_j)
                f ← mental_recognition(O_j, X_j)
            end for
        end while
    end for
    motor_execution(U)
end procedure


In IET, Algorithm 3, joint positions U_j are generated for
each intermediate goal after the mental execution of U_[0, j-1].


Algorithm 3 Incrementally Evolved Trajectories (IET)
procedure IET(X)
    individuals ← initialize
    for j < n do
        while not termination_conditions do
            for each individual do
                mental_execution(U_[0, j-1])
                U_j ← evolve(DoF)
                O_j ← mental_execution(U_j)
                f ← mental_recognition(O_j, X_j)
            end for
        end while
    end for
    motor_execution(U)
end procedure























Experimental evidence from previous publications has
determined FTE to be the strategy that requires the most
evaluations for fitness convergence [8]. The main intuition behind
this large number of required evaluations is that evolutionary
algorithms are greatly affected by the size of the search
space. In FTE, the search space is (DoF·n)-dimensional,
which is proportional to the number of intermediate goals.

IE is the strategy that requires the fewest evaluations for fitness
convergence of the three presented strategies, with a
DoF-dimensional search space. However, joint positions are
generated independently for each intermediate goal, which leads
to an inherent issue. In the case of final intermediate goals,
this means accomplishing the majority of a final goal with
a single robot joint position. Taking the “paint” action as the use
case, accomplishing this is not realistic. Fitness convergence
may also result in the same robot joint position for two or more
different intermediate goals. This is not only a duplicate
effect, but also represents a lost step, or time that could have been used to
achieve a different goal contributing to the general solution.

In IET, the robot joint trajectory that has been computed 
to achieve the previous intermediate goals is executed in the 
simulation before generating each new robot joint position. 
This provides awareness of the previously achieved intermediate
goals, avoiding the inherent issue described for IE. The
search space is DoF-dimensional as in IE. 

III. Online Evolutionary Strategies 

In the previously presented CGDA execution strategies, 
there was a mental process of execution and recognition in 
a simulated environment while monitoring fitness evolution, 
and finally motor execution was performed. In this sense, 
they can be considered offline planning algorithms. The
general layout of an offline CGDA execution evolutionary
strategy is summarized in Algorithm 4, where planning
termination conditions encompass all the loop conditions.


Algorithm 4 Offline Evolutionary Strategy
procedure Offline
    while not planning_termination_conditions do
        mental_process_loop
    end while
    motor_execution(U)
end procedure


This paper presents a new layout for CGDA execution
evolutionary strategies, namely online evolutionary strategies.
The general layout of an online CGDA execution
evolutionary strategy is summarized in Algorithm 5.

Algorithm 5 Online Evolutionary Strategy
procedure Online
    while not planning_termination_conditions do
        mental_process_loop
        motor_execution(U_j)
    end while
end procedure


Motor execution in the online Algorithm 5 is of individual
motor movements U_j, rather than the full robot joint trajectory
U of the offline Algorithm 4. This motor execution of U_j
is performed:

1) Once per intermediate goal. 

2) After a single mental process loop. 

The consequences are, respectively: 

1) Movements should occur n times. 

2) The number of mental process loops between motor
executions is reduced by a factor of n.

A further consequence of (2) is that the total time dedicated
to mental processes between motor executions is no
longer dependent on n, and is therefore independent of the
duration of the action.
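
The difference between the two layouts can be sketched as event logs. This is only an illustration under a simplifying assumption: one mental process loop is counted per intermediate goal, and the function names are hypothetical.

```python
def offline_strategy(n):
    """Offline layout: every mental process loop runs before any movement."""
    log = ["mental"] * n          # one mental process loop per intermediate goal
    log.append("motor(U)")        # single execution of the full trajectory U
    return log


def online_strategy(n):
    """Online layout: one mental process loop, then one movement, n times."""
    log = []
    for j in range(n):
        log.append("mental")
        log.append("motor(U_%d)" % j)
    return log


def longest_mental_run(log):
    """Most mental process loops found between two motor executions."""
    longest = current = 0
    for event in log:
        current = current + 1 if event == "mental" else 0
        longest = max(longest, current)
    return longest
```

With n = 13, the offline log contains 13 consecutive mental loops before its single movement, while the online log never contains more than one, whatever the value of n.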

IV. The OET Algorithm 

Online Evolved Trajectories (OET) is presented in this
paper as a concrete implementation of an online evolutionary
strategy that effectively reduces computation times for
execution within the CGDA framework in real world applications.
The pseudocode of this strategy is presented in Algorithm 6.

Algorithm 6 Online Evolved Trajectories (OET)
procedure OET(X)
    individuals ← initialize
    while not oet_termination_conditions do
        P_t ← sensor_perception
        j ← localization(P_t)
        while not termination_conditions do
            for each individual do
                U_{j+1} ← evolve(DoF)
                O_{j+1} ← mental_execution(U_{j+1})
                f ← mental_recognition(O_{j+1}, X_{j+1})
            end for
        end while
        motor_execution(U_{j+1})
    end while
end procedure


OET termination conditions are evaluated a maximum
of otc times, while additionally monitoring that the final
goal has not been achieved. To introduce motor
execution within the planning algorithm loop, real world
sensor perception and localization steps are additionally
performed.
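
As a rough illustration of the OET loop, the following toy sketch uses a single joint whose position is also the only feature. Several things here are assumptions made for the example and not part of the paper: the forward model `state + u`, the tolerance `tol`, the `world` dictionary standing in for the real environment, and a plain hill-climbing search used in place of the evolutionary back-end.

```python
import random


def localize(p_t, goals, j_prev):
    """Index j >= j_prev of the intermediate goal closest to the percept p_t."""
    return min(range(j_prev, len(goals)), key=lambda j: abs(p_t - goals[j]))


def evolve_next(target, state, rng, population=10, generations=30):
    """Stand-in for the evolutionary back-end (plain hill climbing, not SST)."""
    best_u, best_f = 0.0, abs(state - target)
    for _ in range(generations):
        for _ in range(population):
            u = best_u + rng.gauss(0.0, 1.0)   # candidate joint increment
            o = state + u                      # mental execution (toy forward model)
            f = abs(o - target)                # mental recognition (toy discrepancy)
            if f < best_f:
                best_u, best_f = u, f
    return best_u


def oet(goals, world, otc=50, tol=0.5, seed=0):
    """Run the online loop; returns (final state, number of motor executions)."""
    rng = random.Random(seed)
    j, moves = 0, 0
    for _ in range(otc):
        p_t = world["state"]                   # sensor perception
        j = localize(p_t, goals, j)            # localization
        last = len(goals) - 1
        if j == last and abs(p_t - goals[last]) <= tol:
            break                              # final goal achieved
        u = evolve_next(goals[min(j + 1, last)], p_t, rng)
        world["state"] = p_t + u               # motor execution of U_{j+1}
        moves += 1
    return world["state"], moves
```

Calling `oet([10.0 * k for k in range(11)], {"state": 0.0})` walks through the goals one movement at a time. Starting instead from `{"state": 52.0}`, as if something external had already advanced the action, the localization step resumes from the nearest intermediate goal, so fewer movements are needed.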

A. Sensor Perception 

In the Sensor Perception step, the system extracts the 
scalar features from the real world environment sensor data. 
An updated vector P_t in the m-dimensional feature space of
X is obtained from the current state of the world, at time t,
as in (4).

$$P_t = [P_{0t}, P_{1t}, P_{2t}, P_{3t}, P_{4t}, \dots, P_{mt}]^T \qquad (4)$$














B. Localization 

In the Localization step, the features extracted in the
Sensor Perception step are used to locate the intermediate
goal that corresponds to the current environment state. The
objective of this step is to find the intermediate goal j of the
feature trajectory X that minimizes the discrepancy between P_t
and X_j, as in (5).

$$j = \arg\min_{j \in [j_{prev},\, n]} \left( \lVert P_t - X_j \rVert_p \right) \qquad (5)$$

where j_prev is the index of the previously accomplished
intermediate goal, and p is the order of the norm used for
Localization, preferably the Euclidean L2 norm.
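
The Localization step can be sketched as a constrained nearest-neighbour search. The snippet assumes that X is stored as a list of intermediate goals, each an m-dimensional feature vector, with 0-based indices; `p_norm` and `localize` are hypothetical names.

```python
from typing import Sequence


def p_norm(v: Sequence[float], p: float = 2.0) -> float:
    """Norm of order p of a vector (p = 2 gives the Euclidean norm)."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def localize(P_t, X, j_prev=0, p=2.0):
    """Index j >= j_prev of the intermediate goal X[j] closest to the percept P_t."""
    def distance(j):
        return p_norm([a - b for a, b in zip(P_t, X[j])], p)
    return min(range(j_prev, len(X)), key=distance)
```

Restricting the search to indices from j_prev onwards prevents the strategy from jumping back to an intermediate goal that was already accomplished, even if its features happen to be closer to the current percept.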

V. Experiments 

Three different evolutionary strategies for CGDA were
tested in the experiments of this paper: Full Trajectory
Evolution (FTE), Incrementally Evolved Trajectories (IET), and
Online Evolved Trajectories (OET). Individual Evolution
(IE) was not used due to the inherent issue explained at the
end of Section II. The actions chosen for the experiments
were the “paint” and “iron” actions, as use cases that together
include relevant visual, kinesthetic and force features.

The robotic platform used was TEO, a full-sized humanoid 
robot [10]. For demonstrations of the “paint” action, a 
paintbrush was attached to the left end-effector of the robot, 
and the 6 degrees of freedom of the left arm in gravity 
compensation mode were used. An ASUS Xtion PRO LIVE
RGB-D sensor was used to extract the percentage of painted wall.
For the “iron” action demonstrations, an iron was installed
as the right end-effector using custom 3D printed parts,
and the 6 degrees of freedom of the right arm in gravity
compensation mode were used. The CUI absolute encoders
present in each of the joints of the robot were used to
obtain the Cartesian position of the end-effector via forward
kinematics for the “iron” action. Finally, a JR3 force/torque sensor
mounted on the right wrist of the robot was used to measure
force features in the “iron” demonstrations. For all of the
execution strategies, 3 of the 6 degrees of freedom of the 
right arm of the robot were used for the evolution, keeping 
all the other joints (including torso, legs and head) static. 

ECF [11] was used as the C++ framework for evolutionary 
computation. YARP [12] was used for internal and robot 
component communications. OpenRAVE [13] was used for 
the simulation environment. The experimental datasets and
presented CGDA strategies have been open-sourced¹.

Steady State Tournament (SST) has been the standard 
evolutionary algorithm used in CGDA implementations, and 
has also been used in the experiments in this paper. The 
presented strategies are situated a layer above evolutionary 
algorithms such as SST, which can be considered a back-end. 
Their comparison should not be affected by the selection of a 
specific set of back-end shared parameters. Parameters have 
been set to achieve reasonable execution times on a single 
core of a single machine. 

¹ https://github.com/roboticslab-uc3m/xgnitive


Following this assumption, the SST parameters for all 
the strategies were set to a population of 10 individuals, a 
tournament size of 3 individuals, and an individual mutation 
probability of 60%. The search space of each individual was 
bounded between -15 and 100, which corresponds to the 
individual robot arm joint limits expressed in degrees. 
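
For readers unfamiliar with steady-state tournament selection, one iteration can be sketched generically as follows. This is not the ECF implementation: the averaging crossover, the Gaussian mutation with a standard deviation of 5 degrees, and the function name `sst_step` are assumptions made for illustration. Only the population size, tournament size, mutation probability and bounds follow the values stated above.

```python
import random


def sst_step(population, fitness, rng, tournament_size=3, p_mut=0.6,
             lower=-15.0, upper=100.0):
    """One steady-state tournament iteration, in place; returns the replaced index."""
    contenders = rng.sample(range(len(population)), tournament_size)
    # Fitness is a discrepancy, so the worst contender has the highest value.
    worst = max(contenders, key=lambda i: fitness(population[i]))
    parents = [population[i] for i in contenders if i != worst][:2]
    child = [(a + b) / 2.0 for a, b in zip(parents[0], parents[1])]  # assumed crossover
    if rng.random() < p_mut:
        k = rng.randrange(len(child))
        child[k] += rng.gauss(0.0, 5.0)                              # assumed mutation
    # Keep every joint value inside the bounded search space.
    child = [min(upper, max(lower, g)) for g in child]
    population[worst] = child
    return worst
```

Since only the worst contender of each tournament is replaced, the best discrepancy in the population never gets worse from one iteration to the next.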

FTE termination conditions were to reach a zero fitness
f value, maximum tc = 300, or maximum tc without
improvement in fitness tcf = 75. For IET, tc and tcf are
scaled by n due to the outer n loop, resulting in tc = 300/n
and tcf = 75/n. Finally, for OET, tc and tcf are scaled by
otc due to the outer otc loop, resulting in tc = 300/otc and
tcf = 75/otc.
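
The three termination conditions and their scaling can be sketched as below. The helper names and the representation of the search history as a list of best fitness values, one per termination check, are assumptions. The text does not state how non-integer budgets such as 300/13 are rounded, so the sketch keeps them as real numbers.

```python
def scaled_budget(outer_loops=1, tc_total=300, tcf_total=75):
    """Per-goal budgets: FTE uses outer_loops=1, IET uses n, OET uses otc."""
    return tc_total / outer_loops, tcf_total / outer_loops


def should_terminate(history, tc, tcf):
    """history holds the best fitness observed at each termination check so far."""
    if not history:
        return False
    if history[-1] == 0:                       # zero fitness reached
        return True
    if len(history) >= tc:                     # maximum number of checks
        return True
    best_idx = history.index(min(history))     # first check that reached the best value
    return (len(history) - 1 - best_idx) >= tcf  # checks without improvement
```

For the “paint” action with n = 13, `scaled_budget(13)` gives roughly 23.1 checks per intermediate goal and 5.8 checks without improvement for IET.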

The following metrics were used within the development 
of the experiments: 

• Evaluations: The total number of passes through mental 
recognition. 

• Discrepancy: The final achieved fitness f.

• Real Iteration Time (RIT): Time between two consecutive
motor executions, as defined in (6).

$$RIT = t_j - t_{j_{prev}} \qquad (6)$$

A. Paint 

The “paint” action is a representative use case presented 
in previous work of the authors [8]. While in previous work 
the generalized “paint” action was generated synthetically 
as a linear growth from 0% to 100% of the painted portion 
of a tracked object (a wall), this feature trajectory was now 
generated from 4 user demonstrations. 

Each of the demonstrations was deliberately performed
following a different geometrical trajectory, as depicted in
Figure 2. The figure also depicts the geometrical model
generated using Gaussian Mixture Models and Gaussian
Mixture Regression as in [1]. The method achieves painting
43.75% of the surface, as the mixture of different geometrical
trajectories results in a trajectory similar to their average,
which may or may not be relevant for performing the action.



Fig. 2. The “paint” action demonstrations. The additional thick line depicts 
a failed pure geometrical approach, with K = 7 and T = 600 as in [1]. 





TABLE I

Experimental results for the “paint” action using the three strategies presented in this paper

Strategy   Evaluations          Discrepancy (f)    Real Iteration Time (RIT) [s]   Painted Wall [%]
           μ         σ          μ        σ         μ         σ                     μ        σ
FTE        1716      231.80     49.48    7.40      272.3     68.48                 85.4     3.6
IET        1153      161.65     54       25.36     143       25.87                 72.9     15.72
OET        1603.33   20.82      40.19    3         4         0.6                   89.58    3.6

The average demonstration time of the “paint” action was
D_time = 130.2 s. Selecting a low T_min would result in an
intractable value of n for FTE, due to the DoF·n size of
its search space. T_min = 10 s was set, resulting in n = 13
intermediate goals for comparison of the strategies.
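
As a worked check of these values, taking n as the integer part of the ratio between the average demonstration time and T_min (an assumption that matches the reported n = 13), and using the 3 degrees of freedom evolved in all strategies:

```latex
\begin{align*}
n &= \left\lfloor \frac{D_{time}}{T_{min}} \right\rfloor
   = \left\lfloor \frac{130.2\ \mathrm{s}}{10\ \mathrm{s}} \right\rfloor
   = \lfloor 13.02 \rfloor = 13 \\
\dim(\text{FTE search space}) &= DoF \cdot n = 3 \cdot 13 = 39 \\
\dim(\text{IET and OET search space}) &= DoF = 3
\end{align*}
```

Halving T_min to 5 s would double n to 26 and the FTE search space to 78 dimensions, while the IET and OET search spaces would remain 3-dimensional.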

The results obtained from the CGDA execution strategy
experiments for “paint” are shown in Table I, where averages
and standard deviations were extracted from 3 repetitions
of each experiment. Figure 3 shows a comparison of the
achievement of intermediate goals with each of the strategies,
compared to the reference generalized action obtained from
the user demonstrations.



Fig. 3. Generalized action obtained from user demonstrations compared 
to the intermediate goals achieved by each of the strategies. 


Similar to previous experimental evidence [8], FTE was
the strategy that took the most evaluations to converge, as a result
of the size of the search space. Discrepancy was not the
highest, despite the apparent lack of correlation with respect
to the generalized action in Figure 3. This is due to the
Dynamic Time Warping metric used in mental recognition.
FTE is also the slowest strategy in terms of RIT, accounting
for all the evaluations before motor execution.

IET requires fewer evaluations and a lower RIT than FTE, as a
result of the reduced search space. However, IET has a larger
discrepancy and achieves a lower percentage of painted wall
than FTE. This is because IET may suffer the effects of
non-optimal decisions for initial intermediate goals.

OET results in more evaluations than IET in Table I,
as OET may perform different motor executions until it
achieves an intermediate goal. Figure 3 is a compact
representation that depicts the percentage of painted wall after
achieving each intermediate goal. OET obtained the best
result in terms of RIT, with an average of 4 seconds between
real motor executions. Its final achieved percentage of
painted wall is also the highest, and it additionally minimizes
discrepancy.

B. Iron 

The generalized action for the “iron” action was generated
from 4 demonstrations, depicted in Figure 4. The relevant
features in this action were the end-effector Cartesian
positions and the force exerted by the iron measured on its
vertical axis. The objective was to descend on the ironing
board, apply 30 N force, and then ascend again.

The figure also depicts the pure geometrical model generated
using Gaussian Mixture Models and Gaussian Mixture
Regression as in [1]. In this case, while geometrically accurate,
the measured force was close to zero.



Fig. 4. The “iron” action demonstrations. The additional thick line depicts 
a failed pure geometrical approach, with K = 5 and T = 150 as in [1].


The average demonstration time of the “iron” action was
D_time = 28.1 s. T_min = 3 s was set, resulting in n = 9
intermediate goals for comparison of the strategies. The
results obtained from the experiments are shown in Table II,
extracted from 3 repetitions of each experiment.

TABLE II

Experimental results for the “iron” action using the three strategies presented in this paper

Strategy   Evaluations        Discrepancy (f)    Real Iteration Time (RIT) [s]
           μ       σ          μ       σ          μ        σ
FTE        3010    0          0.7     0.09       2481     1.73
IET        1588    113.74     0.59    0.05       30.30    2.69
OET        1010    400.37     0.30    0.07       1.44     0.16

For the “iron” action, FTE took the maximum number of
evaluations possible, composed of the initialization of the 10
individuals and reaching tc = 300 with this population. FTE
discrepancy and RIT were also the highest for this action,
while intermediate results were obtained with IET.
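
The maximum number of evaluations for FTE can be checked with a short calculation, assuming that each of the tc termination checks corresponds to one evaluation of every individual in the population, as in the inner loop of the FTE algorithm:

```latex
\[
\underbrace{10}_{\text{initialization}}
+ \underbrace{300}_{tc} \times \underbrace{10}_{\text{individuals}}
= 3010
\]
```

This agrees with the 3010 evaluations reported for FTE, and with its zero standard deviation, since every repetition exhausted the same budget.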

OET obtained the best overall results for the “iron” action.
Its average RIT of about 1.4 s is similar to the times of
human mental simulations as measured in [14].

VI. Conclusions 

A change of paradigm in evolutionary strategies for CGDA 
execution, from offline to online evolutionary strategies, is 
presented in this paper. Previously developed algorithms for 
CGDA execution subscribed to a model where planning was 
performed in mental simulations, and the final computed 
trajectory was sent to the robot for motor execution. Online 
evolutionary strategies reduce the time dedicated to mental
processes between motor executions by shifting motor execution
into the planning loop.

A concrete implementation of an online evolutionary strategy,
Online Evolved Trajectories (OET), has additionally
been introduced. OET is an online evolutionary strategy
for CGDA in real world applications, including dynamic
environments or human intervention/collaboration. It enables
human interventions similar to the pure geometric approach
of [15], enhanced by complementary features such as vision
and force. These features are recorded simultaneously and
agnostically. This is an improvement over previous literature on
robot imitation of an “iron” action [16], where geometrical
trajectories are learned first, and then forces are demonstrated
using a separate haptic device during the execution
of the previously learned geometrical trajectory. The results
obtained show a notable improvement over the previous
offline strategies used by the authors, not only in terms of
elapsed time between motor executions, but also in terms of
overall fitness of the Continuous Goal-Directed Action.

OET has opened a new range of possible real world 
applications to the CGDA framework. The implementation 
of real world actions, where the environment experiences 
external changes, or collaborative tasks where the user helps 
the robot to perform the action, is now feasible within the 
CGDA framework. 

Future lines of research include reducing RIT, for instance
through the use of parallelism. As RIT is minimized,
adaptive rates of T_min can additionally be incorporated.